Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebVoyager Baseline Agent & Benchmark #282

Open
wants to merge 54 commits into
base: main
Choose a base branch
from
Open

WebVoyager Baseline Agent & Benchmark #282

wants to merge 54 commits into from

Conversation

alckasoc
Copy link
Member

🤔 Reasoning

Explain the purpose of this PR...

🚧 Changes

Describe the changes made...

✅ PR Checklist

  • Using this PR template?
  • Linked issue?
  • Added feature?
    • Added/updated docs?
    • Added/updated tests?

@alckasoc alckasoc added enhancement New feature or request add-benchmark Adding support for a benchmark labels Jan 17, 2025
Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 8.13810% with 745 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...l/benchmarks/computer_use/webvoyager/webvoyager.py 0.00% 330 Missing ⚠️
...nchmarks/computer_use/webvoyager/utils_webarena.py 0.00% 187 Missing ⚠️
...benchmarks/computer_use/webvoyager/data_manager.py 0.00% 125 Missing ⚠️
...ential/benchmarks/computer_use/webvoyager/utils.py 0.00% 55 Missing ⚠️
...l/agents/computer_use/webvoyager_baseline/agent.py 0.00% 41 Missing ⚠️
...uter_use/webvoyager_baseline/strategies/general.py 89.39% 7 Missing ⚠️

❌ Your patch check has failed because the patch coverage (8.13%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (80.11%) is below the target coverage (95.00%). You can increase the head coverage or adjust the target coverage.

Files with missing lines Coverage Δ
...ter_use/webvoyager_baseline/functional_webarena.py 20.21% <100.00%> (+20.21%) ⬆️
.../agents/computer_use/webvoyager_baseline/output.py 100.00% <100.00%> (ø)
agential/agents/expel/prompts.py 100.00% <ø> (ø)
...al/benchmarks/computer_use/osworld/data_manager.py 96.19% <100.00%> (-0.96%) ⬇️
...gential/benchmarks/computer_use/osworld/osworld.py 21.66% <ø> (ø)
...uter_use/webvoyager_baseline/strategies/general.py 89.39% <89.39%> (ø)
...l/agents/computer_use/webvoyager_baseline/agent.py 0.00% <0.00%> (ø)
...ential/benchmarks/computer_use/webvoyager/utils.py 0.00% <0.00%> (ø)
...benchmarks/computer_use/webvoyager/data_manager.py 0.00% <0.00%> (ø)
...nchmarks/computer_use/webvoyager/utils_webarena.py 0.00% <0.00%> (ø)
... and 1 more

... and 2 files with indirect coverage changes

response = self.generate_thought(messages=messages, seed=seed)
prompt_tokens = response.prompt_tokens
completion_tokens = response.completion_tokens
gpt_4v_res = response.output_text
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we arent using openai specifically. use the LLM class

},
)

def reset( ######## Fix documentation #############
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are u absolutely sure there is no state within this agent?

Returns:
Response: The generated output text from the model.
"""
response = self.llm(messages, max_tokens=max_tokens, seed=seed, timeout=timeout)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structure this into _prompt_* functions

Comment on lines +171 to +179
self,
system_prompt: str,
system_prompt_text_only: str,
seed: int,
max_attached_imgs: int,
temperature: float,
text_only: bool,
task: Dict[str, Any],
obs: Dict[str, Any]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these are hyperparameters. some of them are parameters for the generate method. some of them shouldnt even be parameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system prompts should never need to be passed in (refer to all the agents we've implemented thus far)

also doesn't this baseline webvoyager agent have a state? it keeps a history of all the messages. doesn't it? does your implementation consider that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add-benchmark Adding support for a benchmark enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants